Method of viewing audiovisual documents on a receiver, and receiver for viewing such documents

ABSTRACT

The invention discloses a method of viewing audiovisual documents on a playback device, comprising a step for reading or receiving a document and a display step for viewing the images of this document on a screen. The method further comprises a step for reading or receiving an attribute associated with a time band of the document. A predetermined value of the attribute triggers on the device the enlargement of a part of the image on playback during said time band, because of which this image part occupies a larger area on screen.

FIELD OF THE INVENTION

The invention relates to a method of viewing audiovisual documents on a receiver, and a receiver for viewing such documents. The invention more specifically relates to the processing of the document display.

BACKGROUND

These days a user can download an audiovisual document from a broadcaster to view it on his playback device. More recently, new miniaturized devices for playing back audiovisual transmissions have appeared on the market. Thus, personal digital assistants (PDAs) and even mobile telephones with color graphics screens are provided with programs and memories for storing audiovisual documents, such as films, documentaries, sporting events, and so on. When the part of interest to a user occupies only a small part of the image, the miniaturization of the screen makes it difficult for the user to follow the event.

Associating attributes with the image signals, whether digital or analogue, is widely practiced; these attributes typically concern the title, theme or even the display format. For example, the DVB-SI specification (Digital Video Broadcasting, Specification for Service Information) EN 300 468 V1.3.1 (published by the ETSI) specifies the function and format of the attributes associated with audiovisual documents. For example, an attribute specifies whether the video signals of the document are of 4/3 or 16/9 type. The playback device takes account of this attribute and adapts the display according to its value, which makes it possible to make maximum use of the screen area by distorting the image. However, in the case of devices with miniaturized screens, adapting the display to the screen size does not always make it possible to follow the event in satisfactory conditions.

The document U.S. 2004/148640 (MASUKURA) teaches a device for receiving from a network and playing back an audiovisual document on a screen. Attributes, called “Metadata”, are received from the network in association with the document, this metadata defining regions of the image that are to be enlarged so that they can be viewed more comfortably by the user. The transmission of such information occupies bandwidth and obliges the broadcaster to apply processing prior to transmission.

SUMMARY OF THE INVENTION

The invention proposes a method of viewing an audiovisual document on a playback device provided with a display means, comprising a display step for viewing at normal size the images of this document on said display means; wherein it comprises, on the playback device, a step for determining an area of at least one image of said document according to the content of the image, and a step for enlarging said image area so as to view said area, during a time band, at a larger size on the screen than in the image at normal size.

In this way, the user can easily view actions of interest, even on a small screen. Furthermore, the playback device itself determines the characteristics of the area to be enlarged according to the content of the image. In this way, the display is customized for each receiving device.

According to a first improvement, the device determines the geometric characteristics of the area to be enlarged. According to another improvement, the device itself determines the enlargement ratio of the area according to the characteristics of the display means, a mobile telephone screen for example. In this way, the area to be enlarged is tailored to the device.

According to another improvement, the image is analyzed to determine the position of one or more particular objects, a ball surrounded by a number of players, for example. The duly delimited area around said object will then be enlarged.

According to another improvement, enlargement values are recomputed at regular intervals, during document playback time bands. During a time band, the enlargement step is triggered according to the associated value. In this way, the device adapts the visual content to be enlarged as the document is played back.

According to an improvement, the enlargement concerns only a part of the image appearing on the screen with a magnifying-glass effect. The user thus sees that a part of the image is enlarged, and can position it within the normal image.

The invention also proposes a device for viewing audiovisual documents comprising a display means for viewing whole images of this document on a screen; wherein it further comprises a means of determining an area of at least one image of said document according to the content of the image, and an enlargement means for viewing, during a time band, said area with a larger size than in the image at normal size.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now become apparent in greater detail from the description that follows of embodiments given for illustrative purposes and with reference to the appended figures in which:

FIG. 1 is a block diagram of an audiovisual receiver for implementing the invention,

FIG. 2 is an example of an image taken in close-up,

FIG. 3 is the appearance of the image displayed by a large screen with the close-up of FIG. 2,

FIG. 4 is the appearance of the image displayed by a small screen, with the close-up of FIG. 2,

FIG. 5 is an example of an image taken in long shot,

FIG. 6 is the appearance of the image displayed by a large screen, with the long shot of FIG. 5,

FIG. 7 is the appearance of the image displayed by a small screen, with the long shot of FIG. 5,

FIG. 8 is the appearance of the image displayed by a small screen with the long shot of FIG. 5, using the method of the invention,

FIG. 9 is an image displayed on a screen in which the central part has been enlarged and framed,

FIG. 10 is an example of a flow diagram of a program for generating image shots and attribute values associated with these shots.

DETAILED DESCRIPTION

The operation of an audiovisual playback device 1 provided with a display device 2 is described first with the help of FIG. 1. This device is, for example, a set-top box, a personal computer, a PDA or a mobile telephone device. In the last two examples, the display device 2 and the playback device 1 are incorporated in the same unit. The preferred example described in the rest of the document concerns a set-top box linked to a viewing screen. The device 1 includes a central processor unit 3 linked to a program and working memory 12 (ROM and RAM respectively), and an interface 5 for communication with a local high speed digital bus 6 used to transmit audio/video data in real time. This network is, for example, an IEEE 1394 network. This digital bus is also used to send data to a server. The device 1 can also receive audio/video data from a transmission network via a receive antenna associated with a demodulator 4. The device 1 further comprises an infrared signal receiver 7 for receiving signals from a remote control 8, a storage means 9 for storing audiovisual documents, and an audio/video decoding logic 10 for generating the audiovisual signals sent to the television screen 2. If the audiovisual transmissions are digital, the storage means 9 is preferably a hard disk (HDD), but it can also be a video recorder reading magnetic tapes or a recordable optical disk reader/recorder (DVD-RAM).

The device 1 also has a clock (not shown) for calculating the viewing time of a document.

The device 1 also comprises a circuit 11 for displaying data on the screen, often called an on-screen display (OSD) circuit. The OSD circuit 11 is a text and graphics generator which is used to display on-screen menus and pictograms (for example, a number corresponding to the channel being viewed), or which can be used to mix two audiovisual contents and, if necessary, enlarge one of them. The OSD circuit is mainly controlled by the central processor unit 3 associated with an executable module called a “Display Management Module” (DMM). The DMM module typically comprises a program module resident in read-only memory 12 and parameters stored in working memory. The DMM module can also be produced in the form of a custom circuit, of ASIC type for example.

The audiovisual documents transmitted in digital form are transmitted in packets referenced and stored in the memory 9. Some packets contain images and sound and others contain attributes of the audiovisual document. According to an exemplary embodiment, the documents have an attribute, the value of which specifies a particular type of view. According to a simplified embodiment, the recognized types envisaged for the time being are:

-   Long shot
-   Close-up.

Most of the image of a close-up is presumed to contain the interesting part; for example, the camera zooms in on two football players during a match. In a long shot, by contrast, for example when the whole pitch of the stadium is filmed, only a small part (the vicinity of the ball) is actually of interest to the user. According to the method that is the subject of the present invention, the attribute indicating the type of shot, sometimes called “metadata”, defines the type of video display required on certain types of viewing device. More specifically, the attribute indicates whether the image (or a part) viewed on certain screens should or should not be enlarged.

According to a particularly simple exemplary embodiment, the value of the attribute is updated at camera level. The long shot or close-up adjustment made by the cameraman determines the value of the attribute. This simplified mode does not preclude the possibility of the attribute being updated after the shots have been taken by the broadcaster himself. Thus, the attribute is associated with the transmitted document. For a direct broadcast, the attribute is sent via a dedicated channel in an MPEG-2 stream, the channel corresponding to a header (or PID) linked to the complete program in the program tables. Finally, the attribute can be determined by analysis of the image in the playback device 1.

Depending on the value of the attribute, the viewing device automatically selects a display mode: either the image appears full screen, or the image is enlarged to display only the interesting part. The table below summarizes the various cases that can arise:

Shot type \ Device type    Widescreen device    Small screen device
Long shot                  Video retained       Video enlarged
Close-up                   Video retained       Video retained
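The selection summarized in the table can be sketched in a few lines of Python. This is only an illustration: the function name select_display_mode, the string values and the boolean small_screen flag are assumptions made for the example and do not appear in the description.

```python
def select_display_mode(shot_type: str, small_screen: bool) -> str:
    """Return 'enlarged' or 'retained' for a shot type and a device class.

    Hypothetical helper mirroring the table: only a long shot shown on a
    small screen is enlarged; every other combination keeps the video as is.
    """
    if shot_type not in ("long", "close-up"):
        raise ValueError(f"unknown shot type: {shot_type!r}")
    if shot_type == "long" and small_screen:
        return "enlarged"
    return "retained"
```

For instance, a mobile telephone receiving a long shot would obtain "enlarged", whereas a television receiving the same shot would obtain "retained".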

If the video is retained, the images received are not modified on display, in other words: the entire image appears on the screen. If the screen of the display device 2 is of large size, a television, plasma or LCD screen, or even the image produced by a video projector, there is no need to enlarge any part of the image, because even a small part appears legible enough.

For long shots, only the devices designed to display conventional videos can display legible information. In the case of small displays, an enlargement of the images or a part of the latter is necessary to make the images legible. For example: on a long shot, each player in a football match or rugby match occupies a small area of the images, and the ball is even smaller. A mobile telephone has to apply subsampling of the pixels for each image in order to display the latter on its small screen, so there is a risk of the ball disappearing and therefore adversely affecting the understanding of the event.

In the figures and examples that follow, a football match is used as a document, but, obviously, the present invention can be applied to any audiovisual document: tennis matches, documentaries, films, etc. FIG. 2 is an example of an image taken in close-up. It contains two players around a football. FIG. 3 shows how the close-up of FIG. 2 appears when displayed by a large screen. The players appear clearly. FIG. 4 shows how the close-up of FIG. 2 appears when displayed on a small screen. The players again appear clearly, and the user can follow their action. Advantageously, the device can use an anti-aliasing filter. Such a filter, applied either to all of the image displayed, or only to the contours detected in the image, is used to smooth or soften strong transitions.

FIG. 5 is an example of an image taken in long shot. Fifteen or so players are moving on a football pitch, and the football can be distinguished in the middle of the screen. FIG. 6 shows how the long shot of FIG. 5 appears when displayed on a large screen. The user can distinguish the players and the football without problems. FIG. 7 shows how the long shot of FIG. 5 appears when displayed on a small screen. The user can now barely make out the players and can no longer follow the ball since it is so small on the display. The method that is the subject of the present invention will, if necessary, enlarge the image according to the type of playback device so that the image displayed is again of interest to the user. FIG. 8 shows how the long shot of FIG. 5 appears when displayed by a small screen after processing by the DMM module. The image displayed concerns only the action in the middle of the pitch; four players can easily be seen and the football can be distinguished.

There now follows a description of how the playback device provided with reduced display capability selects the part of the image to be enlarged.

According to a preferred and particularly simple embodiment of the present invention, it can be considered that the interesting part of the action is located in the middle of the image. Because of this, when the attribute indicates that the shot is long, the DMM module sends a command to the OSD circuit 11, consisting in enlarging the central part of the image. According to an improvement, the attribute transmitted and/or read specifies the enlargement value of the interesting area. As a variant of this improvement, this value is defined by the user during a parameter programming step. For example, the user displays a programming submenu and enters the enlargement value with his remote control 8, or any other command entry means, whether incorporated or not in the device 1.

According to an improvement, the value of the attribute varies according to the image displayed, and therefore while the document is being played back. For example, to return to the example of a football match, the image sometimes concerns long shots of the pitch and sometimes close-ups on the players. Because of this, the attribute comprises a series of data pairings; the first data item specifies a time band of the document and the second is the value of the attribute during this time window. The time window is measured from the start of the document. The clock of the playback device is used to provide the time reference. The table below illustrates the shot breakdown of a document, each shot having its own attribute value:

Shot number    Time band                  Attribute value
1              0 h 00′00″ - 0 h 01′23″    L
2              0 h 01′24″ - 0 h 01′53″    C
3              0 h 01′54″ - 0 h 03′11″    L
4              0 h 03′12″ - 0 h 01′45″    C

In which “L” indicates that the shot is long and “C” that the shot is close-up.
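As an illustration, the lookup of the attribute value for a given playback time could be sketched as follows. The names BANDS, to_seconds and attribute_at are hypothetical, and only the first three rows of the table are used; this is a sketch under those assumptions, not the format actually transmitted.

```python
from typing import List, Optional, Tuple


def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert a time measured from the start of the document to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# (start, end, value) triples taken from the first three rows of the table.
BANDS: List[Tuple[int, int, str]] = [
    (to_seconds(0, 0, 0), to_seconds(0, 1, 23), "L"),
    (to_seconds(0, 1, 24), to_seconds(0, 1, 53), "C"),
    (to_seconds(0, 1, 54), to_seconds(0, 3, 11), "L"),
]


def attribute_at(t: int, bands: List[Tuple[int, int, str]]) -> Optional[str]:
    """Return the attribute value of the band containing time t, if any."""
    for start, end, value in bands:
        if start <= t <= end:
            return value
    return None
```

The time t would be supplied by the clock of the playback device; a time outside every band yields None, leaving the display unchanged.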

If the invention is used for a direct transmission of an event, the first data item specifies only the starting value of the time band, associated with the value of the L or C attribute. This value will remain valid until the start of the next band, when the attribute value changes.

The above table then becomes:

Shot number    Time band     Attribute value
1              0 h 00′00″    L
2              0 h 01′24″    C
3              0 h 01′54″    L
4              0 h 03′12″    C

If the attribute can take only two mutually exclusive values (for example, when only the long and close-up shots are detected and used), the transmission of the starting values of the time bands is enough provided that the first attribute value for the first band has been communicated.

The above table then becomes:

Shot number    Time band     Attribute value
1              0 h 00′00″    L
2              0 h 01′24″
3              0 h 01′54″
4              0 h 03′12″
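With only two mutually exclusive values, the receiver can recover the value of every band from the list of starting times and the first value, since the value alternates at each band start. A minimal sketch, assuming the hypothetical function name attribute_from_starts and starting times expressed in seconds:

```python
from bisect import bisect_right
from typing import List


def attribute_from_starts(t: int, starts: List[int], first_value: str) -> str:
    """Recover 'L' or 'C' at time t from sorted band start times.

    Assumes the two values strictly alternate from one band to the next,
    which holds when only long and close-up shots are signalled.
    """
    index = bisect_right(starts, t) - 1  # index of the band containing t
    if index < 0:
        raise ValueError("time precedes the first band")
    other = "C" if first_value == "L" else "L"
    return first_value if index % 2 == 0 else other
```

With the starting times of the table, that is 0, 84, 114 and 192 seconds, and a first value of "L", a time of 90 seconds falls in the second band and therefore gives "C".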

According to another improvement, the interesting part of the screen is specified in another field of the attribute. This part is a rectangle and the duly delimited interesting area is characterized by the centre of the rectangle. The device 1 computes the sampling in order to determine the final image viewed about this central point and therefore the enlargement applied to the interesting area.

This improvement can easily be combined with the preceding one, such that the table defining the attribute is enhanced as follows:

Shot number    Time band     Attribute value    Centre of interesting area
1              0 h 00′00″    L                  (x₁, y₁)
2              0 h 01′24″    C                  (x₂, y₂)
3              0 h 01′54″    L                  (x₃, y₃)
4              0 h 03′12″    C                  (x₄, y₄)
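One possible way for the device to derive the source rectangle from the centre given in the attribute is sketched below. The function name crop_around, the enlargement ratio parameter and the clamping to the image borders are assumptions of this example; the description does not specify how a centre close to an edge is handled.

```python
from typing import Tuple


def crop_around(center_x: int, center_y: int,
                width: int, height: int,
                ratio: float) -> Tuple[int, int, int, int]:
    """Return (left, top, crop_width, crop_height) of the area to enlarge.

    The rectangle has the image aspect ratio, is 1/ratio the image size,
    and is shifted as needed so that it stays inside the image.
    """
    if ratio < 1:
        raise ValueError("enlargement ratio must be at least 1")
    crop_w = round(width / ratio)
    crop_h = round(height / ratio)
    left = min(max(center_x - crop_w // 2, 0), width - crop_w)
    top = min(max(center_y - crop_h // 2, 0), height - crop_h)
    return left, top, crop_w, crop_h
```

Passing the middle of the image as the centre reproduces the simple embodiment in which the central part is enlarged.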

The data in the table can be transmitted and stored in the form of metadata directly associated with the stream in an MXF schema. According to an improvement, the attribute contains, for each time band, a set of pairs associating a point of the image defining the centre of the interesting area and a topic, a theme for example, or a subject. This is used to differentiate certain parts of the image according to their topic. For example, when transmitting a cycle race, one and the same image can show the leading rider, the following pack, the surrounding landscapes and so on. A user may want to view only one of these image portions in enlarged format. For this, the device takes a keyword entered by the user (for example: “first”, “pack”, “environment”), and compares it with the key word or words associated with a data set. If the words correspond (either literally, or because they are synonymous), then the image portion corresponding to this key word is enlarged.
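The keyword comparison can be illustrated as follows. The function name select_area, the representation of the pairs as (centre, topic) tuples and the synonym dictionary are all assumptions made for this sketch; the description does not say how synonyms are stored.

```python
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[int, int]


def select_area(keyword: str,
                areas: List[Tuple[Point, str]],
                synonyms: Dict[str, Set[str]]) -> Optional[Point]:
    """Return the centre whose topic matches the keyword or one of its synonyms.

    The synonyms mapping (lower-case keyword -> lower-case equivalents)
    is a hypothetical stand-in for whatever vocabulary the device holds.
    """
    wanted = {keyword.lower()} | synonyms.get(keyword.lower(), set())
    for center, topic in areas:
        if topic.lower() in wanted:
            return center
    return None
```

The returned point would then serve as the centre of the area to be enlarged; None means no portion of the image matches the user's keyword.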

According to an embodiment variant, the DMM module controls the OSD so that a part of the image played back is enlarged. The image then comprises an enlarged part containing the most interesting part of the audiovisual document, the part of the image surrounding this enlarged part remaining at a normal size. This variant can be used when the attribute specifies an interesting area and a value for enlarging this area, and, once enlarged, this area occupies only a part of the screen. Advantageously, the video of the frame surrounding the enlarged part is distorted to reinforce the fact that the enlarged image appears with a magnifying-glass effect. FIG. 9 shows a composite image in which the central part has been enlarged and framed.

The size of the interesting area around the point specified in the attribute can be entered manually using a menu and remote control buttons. This user interface is incorporated in the DMM module.

According to another variant, the characteristics of the interesting part of the image (position, size) are determined by the playback device 1, by using an image recognition engine. For example, in a football match, the action is always located around the football. The image recognition engine analyses the video data and searches for a moving object having the visual characteristics of a football. The engine generates a video area in which the football is at the centre and communicates this area to the DMM module which is responsible for enlarging it as appropriate. Such an engine can be used to analyze all sporting documents in which the players use an object as the stake of the game (football, rugby ball, tennis ball, etc.).

Another way of determining an interesting area within the playback device involves analyzing the activity of the elements of the image; in practice, this area normally contains more activity than the rest of the image. This activity measurement is based on a prior measurement of the movement of the camera between the current image and the preceding one. Then, the movement of each pixel between the preceding image and the current image is computed, and the component due to the movement of the camera computed previously is subtracted from it. The result is then, for each pixel, the value of the intrinsic movement of the object. The barycentre of these per-pixel values is taken as the centre of the area of interest.
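The activity analysis can be sketched as below, under two assumptions that are not stated in the description: the camera movement is modelled as a single translation (cam_dx, cam_dy) applied to the whole image, and the barycentre is weighted by the magnitude of the intrinsic movement of each pixel. The function name activity_centre and the list-of-rows layout of the motion field are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vector = Tuple[float, float]


def activity_centre(motion: List[List[Vector]],
                    camera_motion: Vector) -> Optional[Tuple[float, float]]:
    """Return the (x, y) barycentre of intrinsic motion, or None if there is none.

    motion[y][x] is the displacement of pixel (x, y) between two images;
    the camera displacement is subtracted before measuring the magnitude.
    """
    cam_dx, cam_dy = camera_motion
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(motion):
        for x, (dx, dy) in enumerate(row):
            magnitude = math.hypot(dx - cam_dx, dy - cam_dy)
            total += magnitude
            sum_x += x * magnitude
            sum_y += y * magnitude
    if total == 0:
        return None
    return sum_x / total, sum_y / total
```

In a uniform pan where a single object moves differently from the background, the returned centre coincides with that object.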

According to this variant, the DMM module determines the area to be enlarged during predetermined time bands. Determination can be performed at regular intervals, every second, for example, so defining the duration of the time band. In this way, the device constantly determines whether the image received contains an area that requires enlargement taking into account the display means and, where appropriate, parameters entered by the user. Another method consists in detecting breaks between the sequences and running a step for determining the area to be enlarged only at the time of the breaks. In practice, when the shot is close-up, there is no a priori need to enlarge the image, but when the image appears in long shot, it is necessary to determine whether an enlargement is necessary for good legibility on the display means.

In a more sophisticated way, the DMM module incorporated in the playback device contains a program represented by the flow diagram illustrated in FIG. 10. This program constitutes a method of generating shots and attribute values associated with these shots. The first step of this program consists in a pre-processing to divide the video into shots, a shot being characterized by a shot unit. A luminance histogram is computed from a subsampled version of each image of the video sequence. For interlaced video, each field is processed in turn. For compressed video, the low-resolution images are obtained directly from the DC (direct current) coefficients that can originate, for example, from a DCT (discrete cosine transform) breakdown. The histograms are then filtered by a low-pass filter with finite impulse response. The distance between histograms used is defined as the accumulation of the differences from class to class. This distance is computed on successive images or fields. A shot break is then detected when the distance defined previously is greater than a given threshold.
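The shot-break detection can be illustrated by the following sketch. Several choices are assumptions of the example rather than details of the description: images are given as flat lists of 8-bit luminance values, the histogram has 16 classes, the finite impulse response filter is a three-tap smoothing applied across classes, and the class-to-class differences are accumulated as absolute values. All function names are hypothetical.

```python
from typing import List, Sequence


def histogram(luma: Sequence[int], bins: int = 16) -> List[int]:
    """Luminance histogram of one (subsampled) image with values in 0..255."""
    counts = [0] * bins
    for value in luma:
        counts[min(value * bins // 256, bins - 1)] += 1
    return counts


def smooth(hist: Sequence[float],
           taps: Sequence[float] = (0.25, 0.5, 0.25)) -> List[float]:
    """Three-tap FIR smoothing across classes, replicating the edge classes."""
    half = len(taps) // 2
    last = len(hist) - 1
    return [
        sum(t * hist[min(max(i + k - half, 0), last)] for k, t in enumerate(taps))
        for i in range(len(hist))
    ]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Accumulated absolute difference from class to class."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_breaks(frames: Sequence[Sequence[int]], threshold: float) -> List[int]:
    """Indices of images whose distance to the previous image exceeds threshold."""
    hists = [smooth(histogram(frame)) for frame in frames]
    return [
        i for i in range(1, len(hists))
        if histogram_distance(hists[i - 1], hists[i]) > threshold
    ]
```

The threshold would have to be tuned to the image size and content; the description only states that a given threshold is used.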

The second step of the flow diagram of FIG. 10 consists in extracting the attribute values (close-up shot or long shot) of each part of the duly broken down document. For this, the DMM module extracts the following characteristics:

-   the average number of pixels in the shot associated with the dominant color in the sequence,
-   the average activity in the shot,
-   the movement of the camera.

The first characteristic requires the prior computation of the dominant color in the sequence. The objective is to obtain the values in the RGB color space (or other: HSV, HLS, etc.) of the green color of the pitch for an initial system calibration. The computation of the dominant color is based on a conventional clustering algorithm (k-means type). It can be done automatically on a subset of images extracted from the start of the video or from a reference image supplied by a user (this image will be a long shot image of the pitch). Once the color of the pitch has been characterized, a distance measurement between the dominant color of each shot and the color of the pitch is chosen. This distance can be a simple Euclidean distance in the chosen color space or a more complex distance taking into account the number of occurrences of this color in the shot. Patent application WO 03/03203 filed by the applicant and published on May 13, 2004 describes a method of computing such a distance.

The average activity in the shot is the average of the norm of the movement vectors between images in the shot. These vectors can be obtained directly from the MPEG streams or recomputed in the case of video in another format. The distance model between the activities of two shots can be taken as being the absolute difference value.
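These two definitions translate directly into code. In the sketch below the movement vectors are assumed to be already available as (dx, dy) pairs grouped by image; the function names average_activity and activity_distance are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float]


def average_activity(shot_vectors: Sequence[Sequence[Vector]]) -> float:
    """Mean norm of all movement vectors of all images in a shot."""
    norms = [math.hypot(dx, dy) for image in shot_vectors for dx, dy in image]
    return sum(norms) / len(norms) if norms else 0.0


def activity_distance(activity_a: float, activity_b: float) -> float:
    """Distance between the activities of two shots: absolute difference."""
    return abs(activity_a - activity_b)
```

A shot with no movement vectors is given an activity of zero here, which is a convention of the example.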

The movement of the camera can also be estimated from the MPEG movement vectors or from recomputed vectors. A simplified linear movement model with three parameters is sufficient. These three parameters characterize the horizontal and vertical motions and the zoom. The movement model is estimated robustly, that is by taking account only of the pixels associated with the dominant movement. Patent application WO 02/04316 filed by the applicant and published on Jun. 20, 2003 describes a computation method based on the least mean of squares. The distance between the camera movements of two shots can, for example, be taken as a Euclidean distance between the two models with three parameters, this distance possibly being weighted by the ratio between the numbers of pixels associated with the dominant movement in each shot.

There then follows a classification step consisting in separating the long shots from the close-up shots and other shots. For this, the three aforementioned characteristic attributes are extracted for each shot considered and an overall distance measurement is used. This overall distance measurement will be taken, for example, as the weighted sum of the three distances described previously. Two classification methods can be used, these two methods being applied to two different contexts:

-   a posteriori classification of the shots,
-   on-line classification of shots for direct transmission of the shots.

The a posteriori classification of the shots can be performed entirely automatically. It consists in separating all of the shots into three subgroups (long shots, close-up shots and other shots), all of the shots and their characteristics being known. A three-class k-means type method can be used here. This method gives the best results, but its applicability is limited in the context of direct transmission with which we are concerned. The flow diagram of FIG. 10 is also partly described in the aforementioned patent application WO 02/04316.

A second method, this time semi-automatic, can also be proposed. It consists firstly in selecting an example of a long shot and of a close-up shot at the start of the video. All the characteristics of these reference shots are then computed.

The classification can then be performed on line by computing the distance between the current shot and these reference shots, then by comparing this difference with a predetermined threshold value in order to obtain a classification in long shots/close-up shots or other shots according to the following algorithm:

If only the difference between the current shot and the long shot is less than the threshold, then the current shot is a long shot.

If only the difference between the current shot and the close-up shot is less than the threshold, then the current shot is a close-up shot.

If the two differences are less than the threshold, then the smaller difference indicates the classification of the shot.

Otherwise the shot is another shot.
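The overall distance and the four rules above can be sketched as follows. The function names, the default weights and the tie-breaking in favor of the long shot when both differences are equal are assumptions of this example.

```python
from typing import Sequence


def overall_distance(colour_d: float, activity_d: float, camera_d: float,
                     weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three per-characteristic distances."""
    return weights[0] * colour_d + weights[1] * activity_d + weights[2] * camera_d


def classify(d_long: float, d_close: float, threshold: float) -> str:
    """Classify the current shot from its distances to the two reference shots."""
    near_long = d_long < threshold
    near_close = d_close < threshold
    if near_long and near_close:
        # Both references are close enough: the smaller difference decides.
        return "long" if d_long <= d_close else "close-up"
    if near_long:
        return "long"
    if near_close:
        return "close-up"
    return "other"
```

Here d_long and d_close would each be obtained by calling overall_distance with the distances between the current shot and the corresponding reference shot.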

For an optimal result, the attributes must be computed from all the images of the shot. The drawback of this computation method lies in the fact that a variable delay is introduced, the delay corresponding to the duration of the current shot. An alternative, non-optimal implementation can, however, be used. In this case, only the first N images of the shot are considered. The implicit assumption here is that the three characteristics of the shot are stable over a given shot.

It is also possible to take into account any changes of the reference characteristics by recomputing the reference characteristics after each shot classification. For example, if a shot is classified as long, the characteristics of the long shot will be recomputed as the average of the characteristics of the preceding long shots and the new long shot.
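This recomputation can be done incrementally, without storing every preceding shot. The sketch below assumes the characteristics of a shot are held as a tuple of numbers and that the number of shots already averaged is tracked alongside; the function name update_reference is hypothetical.

```python
from typing import Sequence, Tuple


def update_reference(reference: Sequence[float], count: int,
                     new_features: Sequence[float]) -> Tuple[Tuple[float, ...], int]:
    """Fold a newly classified shot into the running average of a reference.

    reference is the average over count preceding shots of the same class;
    the result is the average over count + 1 shots and the new count.
    """
    updated = tuple(
        (ref * count + new) / (count + 1)
        for ref, new in zip(reference, new_features)
    )
    return updated, count + 1
```

Only the reference of the class assigned to the current shot would be updated; shots classified as other shots leave both references unchanged in this sketch.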

The exemplary embodiments of the invention described above have been chosen for their concrete nature. It would not, however, be possible to list all the possible embodiments of this invention. In particular, any step or any means described can be replaced by an equivalent step or means without departing from the framework of the present invention.

1. Method of viewing an audiovisual document provided to a display means, comprising steps of: displaying at normal size images of said document on said display means; determining an area of at least one image of said document according to content of the image; and enlarging said image area during a time band such that said area is displayed at a larger size than said normal size.
2. The method of viewing according to claim 1; wherein geometric characteristics of the area are determined during said determination step.
3. The method of viewing according to claim 1; wherein an enlargement ratio of the area, according to the characteristics of the display means, is determined during said determination step.
4. The method of viewing according to claim 1; wherein a user entered enlargement ratio is used for enlarging said image area.
5. The method of viewing according to claim 1; wherein said image is analyzed to determine the position of at least one characteristic object, and in that said at least one object is detected as being located in the image area to be enlarged.
6. The method of viewing according to claim 1; wherein said step of determining an area to be enlarged is repeated during time bands when said document is played back; and during such time bands when said document is being played back, the enlargement step is defined according to the duly determined area.
7. The method of viewing according to claim 1; wherein the enlargement step concerns only a part of said image appearing on the screen, where the enlargement is performed with a magnifying-glass enlarging effect.
8. A device for viewing audiovisual documents comprising: a display means for viewing whole images of said document on a screen; a means for determining an area of at least one image of said document according to content of said at least one image; and a means for enlarging that enlarges said at least one area to a larger size than normally displayed during a time band.
9. A viewing device according to claim 8, wherein the means for determining defines geometric characteristics of the at least one area to be enlarged.
10. A viewing device according to claim 8, wherein the means for determining defines an enlargement ratio of said at least one area according to characteristics of said display means.
11. A viewing device according to claim additionally comprising a means for entering an enlargement value, and wherein said means for determining takes into account the enlargement value entered for determining said at least one image area to be enlarged.
12. A viewing device according to claim 8, additionally comprising: a means of detecting at least one characteristic object in an image of said document, said means for determining defining the image to be enlarged so that said at least one characteristic object is detected to be located in the image area to be enlarged.
13. A viewing device according to claim 8, wherein the means for determining is activated at regular intervals.
14. A viewing device according to claim 8, wherein said audiovisual document is made up of sequences; and said means for determining is activated on changes of sequences.
15. A viewing device according to claim 8, wherein the display means enlarges said at least one area determined in a portion of the screen in order to show said area with a magnifying-glass enlargement effect.